Lazy Logging and Prefetch-Based Crash Recovery in Software Distributed Shared Memory Systems
Abstract
In this paper, we propose a new, efficient logging protocol, called lazy logging, and a fast crash recovery protocol, called prefetch-based crash recovery (PCR), for software distributed shared memory (SDSM). Lazy logging minimizes failure-free overhead by logging only the data indispensable for correct recovery, while PCR reduces recovery time by prefetching data according to future memory access patterns, thus eliminating memory miss penalties during the recovery process. We implemented both our protocols and the earlier reduced-stable logging (RSL) protocol in TreadMarks, a state-of-the-art SDSM system, and compared them experimentally on workstation clusters. The results show that lazy logging consistently outperforms RSL: it increases execution time by only 1% to 4% during failure-free execution, whereas RSL incurs an execution time overhead of 6% to 21% due to its larger log size and higher disk access frequency. PCR also outperforms the widely used simple crash recovery protocol by 18% to 57% across all applications examined.
Similar resources
An Efficient Logging Scheme for Lazy Release Consistent Distributed Shared Memory Systems
We propose a low-overhead logging scheme for the distributed shared memory system based on the lazy release consistent memory model. In the proposed scheme, stable logging is performed when a lock grant causes an actual dependency relation between the processes, which significantly reduces the logging frequency. Also, instead of making a stable log of the accessed data items, a process logs sta...
An efficient causal logging scheme for recoverable distributed shared memory systems
This paper presents a causal logging scheme for the lazy release consistent distributed shared memory systems. Causal logging is a very attractive approach to provide the fault tolerance for the distributed systems, since it eliminates the need of stable logging. However, since inter-process dependency must causally be transferred with the normal messages, the excessive message overhead has bee...
Reduced Overhead Logging for Rollback Recovery in Distributed Shared Memory
Rollback techniques that use message logging and deterministic replay can be used in parallel systems to recover a failed node without involving other nodes. Distributed shared memory (DSM) systems cannot directly apply message-passing logging techniques because they use inherently nondeterministic asynchronous communication. This paper presents new logging schemes that reduce the typically hig...
Coherence-Centric Logging and Recovery for Home-Based Software Distributed Shared Memory
The probability of failures in software distributed shared memory (SDSM) increases as the system size grows. This paper introduces a new, efficient message logging technique, called the coherence-centric logging (CCL) and recovery protocol, for home-based SDSM. Our CCL minimizes failure-free overhead by logging only data necessary for correct recovery and tolerates high disk access latency by o...
An Efficient Logging Scheme for Lazy Release Consistent Distributed Shared Memory System
We propose a low-overhead logging scheme for the distributed shared memory system based on the lazy release consistent memory model. In the proposed scheme, stable logging is performed when a lock grant causes an actual dependency relation between the processes, which significantly reduces the logging frequency. Also, instead of making a stable log of the accessed data items, a process logs sta...